The unified model revisited

Author

Stephen Robertson

Abstract

This paper reconsiders the unified probabilistic model of information retrieval, proposed by the author with M.E. Maron and W.S. Cooper in 1982, as a reconciliation of the Maron & Kuhns and Robertson & Sparck Jones models. Some basic concepts of the unified model, such as documents, user needs, and terms as properties of these, are discussed and reformulated in the light of later work. The issue of the event space underlying the model is also re-assessed. An event space consisting of a Cartesian product of four random variables is proposed: two observed, the texts of the document and query, and two hidden, the models assumed to underlie the texts. Relevance is seen as a derived random variable within this space. The product space should not, however, be flattened: its structure is important and must be retained.

In this paper I will revisit some work done over 20 years ago, with M.E. (Bill) Maron and W.S. (Bill) Cooper (Robertson, Maron and Cooper, 1982; Robertson, Maron and Cooper, 1983). I will also consider some more recent developments. The two central, related issues are: (1) how we construe the basic objects of the IR space, their properties and relationships; and (2) how we interpret the notion of event space in this context, for the purpose of statistical models and experiments.

1 What was the problem?

When I visited Berkeley in the spring of 1981, I discovered that Bill Maron and I had independently been thinking about (and getting nowhere with) the same problem. This was the possible relationship between Bill's work (with J.L. Kuhns) from some 20 years previously, on a probabilistic model of indexing (Maron and Kuhns, 1960), and my more recent work (with Karen Sparck Jones), on a probabilistic model of searching (Robertson and Sparck Jones, 1976). Both models appeared to address the same question: how to assess the probability that a particular document will be judged as relevant to a particular user request by that user.
The problem was that the two models seemed to address this question in different and apparently incompatible ways. We eventually reached a formulation of the problem, as follows. In both models we are concerned with the properties of documents or queries (for example the terms they contain). However, the two models have very different views of where these properties belong and how they might be used.

In Maron and Kuhns' model (referred to as Model 1), the human indexer is supposed to have a specific document in front of him/her, and to be imagining the kinds of users who might find this document useful. He or she has no knowledge of individual users or their queries, but knows something about the kinds of users the retrieval service might expect to have as clients, and how they ask questions. Thus the terms are assumed to characterise the users and their requests – this association is regarded as a fixed point, which the indexer cannot affect or change. Then the indexer's function is to index (represent) a document in ways that will match well those queries put by users who will find this document relevant.

By contrast, the Robertson/Sparck Jones model (Model 2) starts from the searcher end. Documents are assumed to be already indexed, and the searcher-end task is to represent queries in ways that will match well those documents that the user will find relevant. Here there is no direct view of individual documents – only of documents characterised by their predefined properties.

The difference becomes more evident when we consider the possibility of relevance feedback. In Model 1 we might use known relevance judgements to modify the indexing of specific documents, so as to improve their representations for future users. In Model 2, we may use relevance judgements to modify the query, for future searches for the same user information need. Neither model is capable of dealing with the other form of feedback.
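As a rough illustration of this contrast, and not the estimation formulas of either original paper, the sketch below groups the same toy relevance judgements in the two ways described above. The data, the function names `model1_estimates` and `model2_estimates`, and the use of plain relative frequencies are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy judgements (invented for illustration only):
# (document id, document terms, query id, query terms, judged relevant?)
JUDGEMENTS = [
    ("d1", {"retrieval", "probability"}, "q1", {"retrieval"}, True),
    ("d1", {"retrieval", "probability"}, "q2", {"retrieval"}, False),
    ("d2", {"indexing"}, "q1", {"retrieval"}, False),
    ("d2", {"indexing"}, "q3", {"indexing"}, True),
]


def _relative_frequencies(counts):
    """Turn {key: [relevant, total]} counts into {key: relevant / total}."""
    return {key: rel / total for key, (rel, total) in counts.items()}


def model1_estimates(judgements):
    """Model 1 view: fix an individual document, group users by query term.

    The query terms are the fixed point; the result can be read as evidence
    for how this specific document should be indexed.
    """
    counts = defaultdict(lambda: [0, 0])  # (doc_id, query_term) -> [rel, total]
    for doc_id, _doc_terms, _query_id, query_terms, relevant in judgements:
        for term in query_terms:
            counts[(doc_id, term)][0] += int(relevant)
            counts[(doc_id, term)][1] += 1
    return _relative_frequencies(counts)


def model2_estimates(judgements):
    """Model 2 view: fix an individual query, group documents by index term.

    The document terms are the fixed point; the result can be read as evidence
    for how this specific query should be represented.
    """
    counts = defaultdict(lambda: [0, 0])  # (query_id, doc_term) -> [rel, total]
    for _doc_id, doc_terms, query_id, _query_terms, relevant in judgements:
        for term in doc_terms:
            counts[(query_id, term)][0] += int(relevant)
            counts[(query_id, term)][1] += 1
    return _relative_frequencies(counts)


if __name__ == "__main__":
    print(model1_estimates(JUDGEMENTS))
    print(model2_estimates(JUDGEMENTS))
```

In this sketch the Model 1 table is keyed by individual documents and never looks at document terms, while the Model 2 table is keyed by individual queries and never looks at query terms, which is one way to see why neither view can absorb the other's form of feedback.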
Each model relies on an assumed fixed point to optimise something else which it takes as variable. The obvious next step to consider is to assume no fixed points, but to treat both as variable and to optimise both. But at least in the naïve version of this idea, losing both fixed points means that everything is lost: "properties" which do not characterise anything cannot be said to be properties, and must be regarded as undefined.

2 Objects and properties

Similar resources

High level Ab inito bench mark computaions on weak interactions (H2)2 dimer revisited

The Potential Energy Surface PES of (H2)2 dimer has been investigated, using five simple rigid rotor models. These models are called: head to head, symmetric side to side, L , steplike and T model. All calculations were done at two levels of ab initio methods: MP2(Full) and QCISD (T,Full) using cc-pVTZ basis set at singlet state of spin multiplicity. The results of scanning PES were then fitte...

A UNIFIED MODEL FOR RESOURCE-CONSTRAINED PROJECT SCHEDULING PROBLEM WITH UNCERTAIN ACTIVITY DURATIONS

In this paper we present a unified (probabilistic/possibilistic) model for resource-constrained project scheduling problem (RCPSP) with uncertain activity durations and a concept of a heuristic approach connected to the theoretical model. It is shown that the uncertainty management can be built into any heuristic algorithm developed to solve RCPSP with deterministic activity durations. The esse...

Cyclic Behavior of Beams Based on the Chaboche Unified Viscoplastic Model

In this paper, ratcheting behavior of beams subjected to mechanical cyclic loads at elevated temperature, using the rate dependent Chaboche unified viscoplastic model with combined kinematic and isotropic hardening theory of plasticity, is investigated. A precise and general numerical scheme, using the incremental method of solution, is developed to obtain the cyclic inelastic creep and plastic...

Electron in the Einstein-weyl Space

The classical unified theory of Weyl is revisited. The possibility of stable extended electron model in the Einstein-Weyl space is suggested.

Sliding Singlet Mechanism Revisited

We show that the unification of the doublet Higgs in the standard model (SM) and the Higgs to break the grand unified theory (GUT) group stabilizes the sliding singlet mechanism which can solve the doublet-triplet (DT) splitting problem. And we generalize this attractive mechanism to apply it to many unified scenarios. In this paper, we try to build various concrete E6 unified models by using t...

Sweep Line Algorithm for Convex Hull Revisited

Convex hull of some given points is the intersection of all convex sets containing them. It is used as primary structure in many other problems in computational geometry and other areas like image processing, model identification, geographical data systems, and triangular computation of a set of points and so on. Computing the convex hull of a set of point is one of the most fundamental and imp...

Publication date: 2003
